Construct Single-Hierarchical P/NBD Model for Online Retail Transaction Data
In this workbook we construct the single-hierarchical P/NBD models on the Online Retail transaction data, which covers the longer timeframe.
1 Load and Construct Datasets
Having previously modelled the P/NBD model on our synthetic datasets, we now move on to modelling real-life data.
1.1 Load Online Retail Data
We now want to load the online retail transaction data.
Show code
customer_cohortdata_tbl <- read_rds("data/onlineretail_cohort_tbl.rds")
customer_cohortdata_tbl |> glimpse()

Rows: 5,852
Columns: 5
$ customer_id <chr> "12346", "12347", "12348", "12349", "12350", "12351", …
$ cohort_qtr <chr> "2010 Q1", "2010 Q4", "2010 Q3", "2010 Q2", "2011 Q1",…
$ cohort_ym <chr> "2010 03", "2010 10", "2010 09", "2010 04", "2011 02",…
$ first_tnx_date <date> 2010-03-02, 2010-10-31, 2010-09-27, 2010-04-29, 2011-…
$ total_tnx_count <int> 3, 8, 5, 3, 1, 1, 9, 2, 1, 2, 6, 2, 5, 10, 6, 4, 10, 2…
Show code
customer_transactions_tbl <- read_rds("data/onlineretail_transactions_tbl.rds")
customer_transactions_tbl |> glimpse()

Rows: 36,658
Columns: 4
$ tnx_timestamp <dttm> 2009-12-01 07:45:00, 2009-12-01 07:45:59, 2009-12-01 09…
$ customer_id <chr> "13085", "13085", "13078", "15362", "18102", "12682", "1…
$ invoice_id <chr> "489434", "489435", "489436", "489437", "489438", "48943…
$ tnx_amount <dbl> 505.30, 145.80, 630.33, 310.75, 2286.24, 372.30, 50.40, …
Show code
customer_subset_id <- read_rds("data/onlineretail_customer_subset_ids.rds")
customer_subset_id |> glimpse()

 chr [1:2000] "12347" "12348" "12349" "12355" "12357" "12359" "12360" ...
1.2 Load Derived Data
Show code
customer_summarystats_tbl <- read_rds("data/onlineretail_customer_summarystats_tbl.rds")
obs_fitdata_tbl <- read_rds("data/onlineretail_obs_fitdata_tbl.rds")
obs_validdata_tbl <- read_rds("data/onlineretail_obs_validdata_tbl.rds")
customer_fit_stats_tbl <- obs_fitdata_tbl |>
  rename(x = tnx_count)

1.3 Load Subset Data
We also construct subsets of our data to speed up our model assessments.
Show code
customer_fit_subset_tbl <- obs_fitdata_tbl |>
filter(customer_id %in% customer_subset_id)
customer_fit_subset_tbl |> glimpse()

Rows: 2,000
Columns: 6
$ customer_id <chr> "12347", "12348", "12349", "12355", "12357", "12359", "…
$ first_tnx_date <dttm> 2010-10-31 14:19:59, 2010-09-27 14:58:59, 2010-04-29 1…
$ last_tnx_date <dttm> 2010-10-31 14:19:59, 2010-09-27 14:58:59, 2010-10-28 0…
$ tnx_count <dbl> 0, 0, 1, 0, 0, 5, 2, 2, 0, 0, 2, 2, 1, 1, 3, 0, 2, 4, 0…
$ t_x <dbl> 0.000000, 0.000000, 25.970536, 0.000000, 0.000000, 44.1…
$ T_cal <dbl> 4.3432540, 9.1965278, 30.7777778, 27.6429563, 2.0828373…
Show code
customer_valid_subset_tbl <- obs_validdata_tbl |>
filter(customer_id %in% customer_subset_id)
customer_valid_subset_tbl |> glimpse()

Rows: 2,000
Columns: 3
$ customer_id <chr> "12347", "12348", "12349", "12355", "12357", "12359"…
$ tnx_count <dbl> 7, 4, 1, 1, 1, 4, 3, 1, 0, 0, 0, 1, 1, 2, 4, 0, 0, 1…
$ tnx_last_interval <dbl> 53.09444, 42.65010, 50.77292, 22.79653, 48.66736, 45…
We now use these datasets to set the start and end dates for our various validation methods.
Show code
dates_lst <- read_rds("data/onlineretail_simulation_dates.rds")
use_fit_start_date <- dates_lst$use_fit_start_date
use_fit_end_date <- dates_lst$use_fit_end_date
use_valid_start_date <- dates_lst$use_valid_start_date
use_valid_end_date  <- dates_lst$use_valid_end_date

Finally, we set the directories where we save our Stan code and the model outputs.
Show code
stan_modeldir <- "stan_models"
stan_codedir  <- "stan_code"

2 Fit First Hierarchical Lambda Model
Our first hierarchical model places a hierarchical prior on the mean of the population \(\lambda\) parameter, lambda_mn.
Once again we use a Gamma prior for it.
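The model inputs specify this Gamma prior through a mean and coefficient of variation (the lambda_cv and mu_cv data values below). As a sketch of the standard conversion, and assuming the Stan code (not reproduced here) uses this mean/CV parameterisation, the shape and rate can be recovered as follows:

```r
# For Gamma(shape, rate): mean = shape / rate and CV = 1 / sqrt(shape),
# so shape = 1 / CV^2 and rate = shape / mean.
# This helper is illustrative only; the Stan code may do this internally.
gamma_mean_cv_to_shape_rate <- function(mn, cv) {
  shape <- 1 / cv^2
  rate  <- shape / mn
  c(shape = shape, rate = rate)
}

# e.g. the inputs mu_mn = 0.10, mu_cv = 1.00 imply a Gamma(1, 10) prior
gamma_mean_cv_to_shape_rate(0.10, 1.00)
```

A CV of 1 corresponds to an Exponential distribution, which is a fairly diffuse choice for these rate parameters.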
2.1 Compile and Fit Stan Model
We now compile this model using CmdStanR.
Show code
pnbd_onehierlambda_stanmodel <- cmdstan_model(
"stan_code/pnbd_onehier_lambda.stan",
include_paths = stan_codedir,
pedantic = TRUE,
dir = stan_modeldir
)

We then use this compiled model with our data to produce a fit of the data.
Show code
stan_modelname <- "pnbd_onlineretail_onehierlambda1"
stanfit_prefix <- str_c("fit_", stan_modelname)
stanfit_seed <- stanfit_seed + 1
stanfit_object_file <- glue("data/{stanfit_prefix}_stanfit.rds")
stan_data_lst <- customer_fit_stats_tbl |>
select(customer_id, x, t_x, T_cal) |>
compose_data(
hier_lambda_mn_p1 = 0.25,
hier_lambda_mn_p2 = 1,
lambda_cv = 1.00,
mu_mn = 0.10,
mu_cv = 1.00,
)
if(!file_exists(stanfit_object_file)) {
pnbd_onlineretail_onehierlambda1_stanfit <- pnbd_onehierlambda_stanmodel$sample(
data = stan_data_lst,
chains = 4,
iter_warmup = 500,
iter_sampling = 500,
seed = stanfit_seed,
save_warmup = TRUE,
output_dir = stan_modeldir,
output_basename = stanfit_prefix,
)
pnbd_onlineretail_onehierlambda1_stanfit$save_object(stanfit_object_file, compress = "gzip")
} else {
pnbd_onlineretail_onehierlambda1_stanfit <- read_rds(stanfit_object_file)
}
pnbd_onlineretail_onehierlambda1_stanfit$summary()

# A tibble: 12,720 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <num> <num> <num> <num> <num> <num> <num> <num>
1 lp__ -6.18e+4 -6.18e+4 6.86e+1 6.62e+1 -6.19e+4 -6.17e+4 1.00 712.
2 lambda_mn 1.26e-1 1.26e-1 2.49e-3 2.43e-3 1.22e-1 1.30e-1 1.00 1204.
3 lambda[1] 5.60e-2 4.71e-2 4.06e-2 3.40e-2 9.59e-3 1.33e-1 1.00 1735.
4 lambda[2] 8.89e-2 6.23e-2 8.72e-2 6.00e-2 6.77e-3 2.65e-1 1.00 1989.
5 lambda[3] 8.02e-2 5.14e-2 8.54e-2 5.21e-2 4.59e-3 2.47e-1 0.999 1413.
6 lambda[4] 5.19e-2 4.33e-2 3.65e-2 3.20e-2 9.51e-3 1.21e-1 1.00 1696.
7 lambda[5] 1.25e-1 8.74e-2 1.24e-1 8.91e-2 6.49e-3 3.67e-1 1.00 1428.
8 lambda[6] 1.88e-1 1.56e-1 1.34e-1 1.11e-1 3.38e-2 4.46e-1 1.00 1623.
9 lambda[7] 8.71e-2 6.14e-2 8.97e-2 6.28e-2 3.84e-3 2.65e-1 1.00 1663.
10 lambda[8] 7.66e-2 4.68e-2 8.61e-2 5.01e-2 3.63e-3 2.43e-1 0.999 1901.
# ℹ 12,710 more rows
# ℹ 1 more variable: ess_tail <num>
We have some basic HMC-based validity statistics we can check.
Show code
pnbd_onlineretail_onehierlambda1_stanfit$cmdstan_diagnose()

Processing csv files:
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehierlambda1-1.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehierlambda1-2.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehierlambda1-3.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehierlambda1-4.csv
Warning: non-fatal error reading adaptation data
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
2.2 Visual Diagnostics of the Sample Validity
Now that we have a sample from the posterior distribution we need to create a few different visualisations of the diagnostics.
Show code
parameter_subset <- c(
"lambda_mn",
"lambda[1]", "lambda[2]", "lambda[3]", "lambda[4]",
"mu[1]", "mu[2]", "mu[3]", "mu[4]"
)
pnbd_onlineretail_onehierlambda1_stanfit$draws(inc_warmup = FALSE) |>
mcmc_trace(pars = parameter_subset) +
expand_limits(y = 0) +
labs(
x = "Iteration",
y = "Value",
title = "Traceplot of Sample of Lambda and Mu Values"
) +
  theme(axis.text.x = element_text(size = 10))

We also check \(N_{eff}\) as a quick diagnostic of the fit.
Show code
pnbd_onlineretail_onehierlambda1_stanfit |>
neff_ratio(pars = c("lambda", "mu")) |>
as.numeric() |>
mcmc_neff() +
  ggtitle("Plot of Parameter Effective Sample Sizes")

2.3 Assess the Model
As we intend to run the same logic to assess each of our models, we have combined all this logic into a single function run_model_assessment, to run the simulations and combine the datasets.
Show code
pnbd_stanfit <- pnbd_onlineretail_onehierlambda1_stanfit |>
recover_types(customer_fit_stats_tbl)
pnbd_onlineretail_onehierlambda1_assess_data_lst <- run_model_assessment(
model_stanfit = pnbd_stanfit,
insample_tbl = customer_fit_subset_tbl,
fit_label = "pnbd_onlineretail_onehierlambda1",
fit_end_dttm = use_fit_end_date |> as.POSIXct(),
valid_start_dttm = use_valid_start_date |> as.POSIXct(),
valid_end_dttm = use_valid_end_date |> as.POSIXct(),
sim_seed = 4210
)
pnbd_onlineretail_onehierlambda1_assess_data_lst |> glimpse()

List of 5
$ model_fit_index_filepath : 'glue' chr "data/pnbd_onlineretail_onehierlambda1_assess_fit_index_tbl.rds"
$ model_valid_index_filepath : 'glue' chr "data/pnbd_onlineretail_onehierlambda1_assess_valid_index_tbl.rds"
$ model_simstats_filepath : 'glue' chr "data/pnbd_onlineretail_onehierlambda1_assess_model_simstats_tbl.rds"
$ model_fit_simstats_filepath : 'glue' chr "data/pnbd_onlineretail_onehierlambda1_assess_fit_simstats_tbl.rds"
$ model_valid_simstats_filepath: 'glue' chr "data/pnbd_onlineretail_onehierlambda1_assess_valid_simstats_tbl.rds"
2.3.1 Check In-Sample Data Validation
We first check the model against the in-sample data.
Show code
simdata_tbl <- pnbd_onlineretail_onehierlambda1_assess_data_lst |>
use_series(model_fit_simstats_filepath) |>
read_rds()
insample_plots_lst <- create_model_assessment_plots(
obsdata_tbl = obs_fitdata_tbl,
simdata_tbl = simdata_tbl
)
insample_plots_lst$multi_plot |> print()
Show code
insample_plots_lst$total_plot |> print()
Show code
insample_plots_lst$quant_plot |> print()

This fit looks reasonable and appears to capture most aspects of the data used to fit it. As this is real-life data rather than a synthetic dataset, this is encouraging evidence that our model is valid.
2.3.2 Check Out-of-Sample Data Validation
We now repeat for the out-of-sample data.
Show code
simdata_tbl <- pnbd_onlineretail_onehierlambda1_assess_data_lst |>
use_series(model_valid_simstats_filepath) |>
read_rds()
outsample_plots_lst <- create_model_assessment_plots(
obsdata_tbl = obs_validdata_tbl,
simdata_tbl = simdata_tbl
)
outsample_plots_lst$multi_plot |> print()
Show code
outsample_plots_lst$total_plot |> print()
Show code
outsample_plots_lst$quant_plot |> print()

As with our shorter-timeframe data, the model appears to be working well overall.
3 Fit Second Hierarchical Lambda Model
In this model we keep a broadly similar structure, but use a different mean for the hierarchical prior on lambda_mn.
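The only input that changes is hier_lambda_mn_p1, moving from 0.25 in the first model to 0.50 here. Treating these two inputs as the shape and rate of the Gamma prior (an assumption, as the Stan code is not reproduced here), the implied prior mean on lambda_mn doubles:

```r
# Prior mean of a Gamma(shape, rate) distribution is shape / rate.
c(
  onehierlambda1 = 0.25 / 1,  # hier_lambda_mn_p1 = 0.25 -> prior mean 0.25
  onehierlambda2 = 0.50 / 1   # hier_lambda_mn_p1 = 0.50 -> prior mean 0.50
)
```

Comparing the two fits then gives a quick check of how sensitive the posterior is to this prior choice.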
3.1 Fit Stan Model
We now want to fit the model to our data using this alternative prior for lambda_mn.
Show code
stan_modelname <- "pnbd_onlineretail_onehierlambda2"
stanfit_prefix <- str_c("fit_", stan_modelname)
stanfit_seed <- stanfit_seed + 1
stanfit_object_file <- glue("data/{stanfit_prefix}_stanfit.rds")
stan_data_lst <- customer_fit_stats_tbl |>
select(customer_id, x, t_x, T_cal) |>
compose_data(
hier_lambda_mn_p1 = 0.50,
hier_lambda_mn_p2 = 1,
lambda_cv = 1.00,
mu_mn = 0.10,
mu_cv = 1.00,
)
if(!file_exists(stanfit_object_file)) {
pnbd_onlineretail_onehierlambda2_stanfit <- pnbd_onehierlambda_stanmodel$sample(
data = stan_data_lst,
chains = 4,
iter_warmup = 500,
iter_sampling = 500,
seed = stanfit_seed,
save_warmup = TRUE,
output_dir = stan_modeldir,
output_basename = stanfit_prefix,
)
pnbd_onlineretail_onehierlambda2_stanfit$save_object(stanfit_object_file, compress = "gzip")
} else {
pnbd_onlineretail_onehierlambda2_stanfit <- read_rds(stanfit_object_file)
}
pnbd_onlineretail_onehierlambda2_stanfit$summary()

# A tibble: 12,720 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <num> <num> <num> <num> <num> <num> <num> <num>
1 lp__ -6.18e+4 -6.18e+4 7.42e+1 7.47e+1 -6.19e+4 -6.17e+4 1.00 539.
2 lambda_mn 1.26e-1 1.26e-1 2.52e-3 2.45e-3 1.22e-1 1.30e-1 1.00 1534.
3 lambda[1] 5.66e-2 4.50e-2 4.42e-2 3.35e-2 9.28e-3 1.44e-1 1.00 2032.
4 lambda[2] 9.08e-2 6.08e-2 9.46e-2 6.14e-2 4.59e-3 2.81e-1 1.00 2187.
5 lambda[3] 8.20e-2 5.53e-2 8.69e-2 5.46e-2 4.36e-3 2.51e-1 1.00 1797.
6 lambda[4] 5.13e-2 4.30e-2 3.57e-2 2.95e-2 1.02e-2 1.20e-1 1.00 1954.
7 lambda[5] 1.23e-1 8.46e-2 1.24e-1 8.53e-2 6.51e-3 3.72e-1 1.00 2260.
8 lambda[6] 1.88e-1 1.55e-1 1.33e-1 1.15e-1 3.45e-2 4.48e-1 1.00 2143.
9 lambda[7] 9.17e-2 6.04e-2 9.49e-2 6.55e-2 4.55e-3 2.79e-1 1.00 2084.
10 lambda[8] 7.54e-2 4.65e-2 8.51e-2 5.04e-2 3.47e-3 2.47e-1 1.00 1429.
# ℹ 12,710 more rows
# ℹ 1 more variable: ess_tail <num>
We have some basic HMC-based validity statistics we can check.
Show code
pnbd_onlineretail_onehierlambda2_stanfit$cmdstan_diagnose()

Processing csv files:
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehierlambda2-1.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehierlambda2-2.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehierlambda2-3.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehierlambda2-4.csv
Warning: non-fatal error reading adaptation data
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
3.2 Visual Diagnostics of the Sample Validity
Now that we have a sample from the posterior distribution we need to create a few different visualisations of the diagnostics.
Show code
parameter_subset <- c(
"lambda_mn",
"lambda[1]", "lambda[2]", "lambda[3]", "lambda[4]",
"mu[1]", "mu[2]", "mu[3]", "mu[4]"
)
pnbd_onlineretail_onehierlambda2_stanfit$draws(inc_warmup = FALSE) |>
mcmc_trace(pars = parameter_subset) +
expand_limits(y = 0) +
labs(
x = "Iteration",
y = "Value",
title = "Traceplot of Sample of Lambda and Mu Values"
) +
  theme(axis.text.x = element_text(size = 10))

We also check \(N_{eff}\) as a quick diagnostic of the fit.
Show code
pnbd_onlineretail_onehierlambda2_stanfit |>
neff_ratio(pars = c("lambda", "mu")) |>
as.numeric() |>
mcmc_neff() +
  ggtitle("Plot of Parameter Effective Sample Sizes")

3.3 Assess the Model
As we intend to run the same logic to assess each of our models, we have combined all this logic into a single function run_model_assessment, to run the simulations and combine the datasets.
Show code
pnbd_stanfit <- pnbd_onlineretail_onehierlambda2_stanfit |>
recover_types(customer_fit_stats_tbl)
pnbd_onlineretail_onehierlambda2_assess_data_lst <- run_model_assessment(
model_stanfit = pnbd_stanfit,
insample_tbl = customer_fit_subset_tbl,
fit_label = "pnbd_onlineretail_onehierlambda2",
fit_end_dttm = use_fit_end_date |> as.POSIXct(),
valid_start_dttm = use_valid_start_date |> as.POSIXct(),
valid_end_dttm = use_valid_end_date |> as.POSIXct(),
sim_seed = 4220
)
pnbd_onlineretail_onehierlambda2_assess_data_lst |> glimpse()

List of 5
$ model_fit_index_filepath : 'glue' chr "data/pnbd_onlineretail_onehierlambda2_assess_fit_index_tbl.rds"
$ model_valid_index_filepath : 'glue' chr "data/pnbd_onlineretail_onehierlambda2_assess_valid_index_tbl.rds"
$ model_simstats_filepath : 'glue' chr "data/pnbd_onlineretail_onehierlambda2_assess_model_simstats_tbl.rds"
$ model_fit_simstats_filepath : 'glue' chr "data/pnbd_onlineretail_onehierlambda2_assess_fit_simstats_tbl.rds"
$ model_valid_simstats_filepath: 'glue' chr "data/pnbd_onlineretail_onehierlambda2_assess_valid_simstats_tbl.rds"
3.3.1 Check In-Sample Data Validation
We first check the model against the in-sample data.
Show code
simdata_tbl <- pnbd_onlineretail_onehierlambda2_assess_data_lst |>
use_series(model_fit_simstats_filepath) |>
read_rds()
insample_plots_lst <- create_model_assessment_plots(
obsdata_tbl = obs_fitdata_tbl,
simdata_tbl = simdata_tbl
)
insample_plots_lst$multi_plot |> print()
Show code
insample_plots_lst$total_plot |> print()
Show code
insample_plots_lst$quant_plot |> print()

This fit looks reasonable and appears to capture most aspects of the data used to fit it. As this is real-life data rather than a synthetic dataset, this is encouraging evidence that our model is valid.
3.3.2 Check Out-of-Sample Data Validation
We now repeat for the out-of-sample data.
Show code
simdata_tbl <- pnbd_onlineretail_onehierlambda2_assess_data_lst |>
use_series(model_valid_simstats_filepath) |>
read_rds()
outsample_plots_lst <- create_model_assessment_plots(
obsdata_tbl = obs_validdata_tbl,
simdata_tbl = simdata_tbl
)
outsample_plots_lst$multi_plot |> print()
Show code
outsample_plots_lst$total_plot |> print()
Show code
outsample_plots_lst$quant_plot |> print()

As with our shorter-timeframe data, the model appears to be working well overall.
4 Fit First Hierarchical Mu Model
We now construct the same style of hierarchical model, but this time placing the hierarchical prior on mu_mn.
4.1 Compile and Fit Stan Model
We compile this model using CmdStanR.
Show code
pnbd_onehiermu_stanmodel <- cmdstan_model(
"stan_code/pnbd_onehier_mu.stan",
include_paths = stan_codedir,
pedantic = TRUE,
dir = stan_modeldir
)

We then use this compiled model with our data to produce a fit of the data.
Show code
stan_modelname <- "pnbd_onlineretail_onehiermu1"
stanfit_prefix <- str_c("fit_", stan_modelname)
stanfit_seed <- stanfit_seed + 1
stanfit_object_file <- glue("data/{stanfit_prefix}_stanfit.rds")
stan_data_lst <- customer_fit_stats_tbl |>
select(customer_id, x, t_x, T_cal) |>
compose_data(
hier_mu_mn_p1 = 0.50,
hier_mu_mn_p2 = 1.00,
lambda_mn = 0.25,
lambda_cv = 1.00,
mu_cv = 1.00
)
if(!file_exists(stanfit_object_file)) {
pnbd_onlineretail_onehiermu1_stanfit <- pnbd_onehiermu_stanmodel$sample(
data = stan_data_lst,
chains = 4,
iter_warmup = 500,
iter_sampling = 500,
seed = stanfit_seed,
save_warmup = TRUE,
output_dir = stan_modeldir,
output_basename = stanfit_prefix,
)
pnbd_onlineretail_onehiermu1_stanfit$save_object(stanfit_object_file, compress = "gzip")
} else {
pnbd_onlineretail_onehiermu1_stanfit <- read_rds(stanfit_object_file)
}

Running MCMC with 4 chains, at most 12 in parallel...
Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 3 Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 4 Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 3 Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 4 Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 3 Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 3 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 3 finished in 104.9 seconds.
Chain 4 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 2 finished in 105.0 seconds.
Chain 4 finished in 105.0 seconds.
Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 1 finished in 128.4 seconds.
All 4 chains finished successfully.
Mean chain execution time: 110.8 seconds.
Total execution time: 128.5 seconds.
Show code
pnbd_onlineretail_onehiermu1_stanfit$summary()

# A tibble: 12,720 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <num> <num> <num> <num> <num> <num> <num> <num>
1 lp__ -5.72e+4 -5.72e+4 7.21e+1 7.17e+1 -5.73e+4 -5.71e+4 1.00 780.
2 mu_mn 4.46e-3 4.46e-3 4.45e-4 4.67e-4 3.76e-3 5.18e-3 1.03 108.
3 lambda[1] 5.08e-2 4.21e-2 3.69e-2 3.02e-2 9.18e-3 1.19e-1 1.00 2625.
4 lambda[2] 1.23e-1 8.24e-2 1.22e-1 8.40e-2 7.73e-3 3.75e-1 1.00 1881.
5 lambda[3] 8.02e-2 5.25e-2 8.86e-2 5.35e-2 3.96e-3 2.49e-1 0.999 2977.
6 lambda[4] 5.71e-2 4.77e-2 4.06e-2 3.50e-2 9.08e-3 1.37e-1 1.00 2338.
7 lambda[5] 2.38e-1 1.70e-1 2.44e-1 1.73e-1 1.02e-2 6.92e-1 1.00 1626.
8 lambda[6] 2.94e-1 2.44e-1 2.10e-1 1.82e-1 5.18e-2 7.22e-1 1.00 1613.
9 lambda[7] 1.18e-1 7.71e-2 1.35e-1 7.99e-2 6.71e-3 3.48e-1 1.00 1894.
10 lambda[8] 5.11e-2 2.70e-2 7.90e-2 2.83e-2 1.82e-3 1.78e-1 1.00 2051.
# ℹ 12,710 more rows
# ℹ 1 more variable: ess_tail <num>
We have some basic HMC-based validity statistics we can check.
Show code
pnbd_onlineretail_onehiermu1_stanfit$cmdstan_diagnose()

Processing csv files:
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehiermu1-1.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehiermu1-2.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehiermu1-3.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehiermu1-4.csv
Warning: non-fatal error reading adaptation data
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
Split R-hat values satisfactory all parameters.
Processing complete, no problems detected.
4.2 Visual Diagnostics of the Sample Validity
Now that we have a sample from the posterior distribution we need to create a few different visualisations of the diagnostics.
Show code
parameter_subset <- c(
"mu_mn",
"lambda[1]", "lambda[2]", "lambda[3]", "lambda[4]",
"mu[1]", "mu[2]", "mu[3]", "mu[4]"
)
pnbd_onlineretail_onehiermu1_stanfit$draws(inc_warmup = FALSE) |>
mcmc_trace(pars = parameter_subset) +
expand_limits(y = 0) +
labs(
x = "Iteration",
y = "Value",
title = "Traceplot of Sample of Lambda and Mu Values"
) +
  theme(axis.text.x = element_text(size = 10))

We also check \(N_{eff}\) as a quick diagnostic of the fit.
Show code
pnbd_onlineretail_onehiermu1_stanfit |>
neff_ratio(pars = c("lambda", "mu")) |>
as.numeric() |>
mcmc_neff() +
  ggtitle("Plot of Parameter Effective Sample Sizes")

4.3 Assess the Model
As we intend to run the same logic to assess each of our models, we have combined all this logic into a single function run_model_assessment, to run the simulations and combine the datasets.
Show code
pnbd_stanfit <- pnbd_onlineretail_onehiermu1_stanfit |>
recover_types(customer_fit_stats_tbl)
pnbd_onlineretail_onehiermu1_assess_data_lst <- run_model_assessment(
model_stanfit = pnbd_stanfit,
insample_tbl = customer_fit_subset_tbl,
fit_label = "pnbd_onlineretail_onehiermu1",
fit_end_dttm = use_fit_end_date |> as.POSIXct(),
valid_start_dttm = use_valid_start_date |> as.POSIXct(),
valid_end_dttm = use_valid_end_date |> as.POSIXct(),
sim_seed = 4230
)
pnbd_onlineretail_onehiermu1_assess_data_lst |> glimpse()

List of 5
$ model_fit_index_filepath : 'glue' chr "data/pnbd_onlineretail_onehiermu1_assess_fit_index_tbl.rds"
$ model_valid_index_filepath : 'glue' chr "data/pnbd_onlineretail_onehiermu1_assess_valid_index_tbl.rds"
$ model_simstats_filepath : 'glue' chr "data/pnbd_onlineretail_onehiermu1_assess_model_simstats_tbl.rds"
$ model_fit_simstats_filepath : 'glue' chr "data/pnbd_onlineretail_onehiermu1_assess_fit_simstats_tbl.rds"
$ model_valid_simstats_filepath: 'glue' chr "data/pnbd_onlineretail_onehiermu1_assess_valid_simstats_tbl.rds"
4.3.1 Check In-Sample Data Validation
We first check the model against the in-sample data.
Show code
simdata_tbl <- pnbd_onlineretail_onehiermu1_assess_data_lst |>
use_series(model_fit_simstats_filepath) |>
read_rds()
insample_plots_lst <- create_model_assessment_plots(
obsdata_tbl = obs_fitdata_tbl,
simdata_tbl = simdata_tbl
)
insample_plots_lst$multi_plot |> print()
Show code
insample_plots_lst$total_plot |> print()
Show code
insample_plots_lst$quant_plot |> print()

This fit looks reasonable and appears to capture most aspects of the data used to fit it. As this is real-life data rather than a synthetic dataset, this is encouraging evidence that our model is valid.
4.3.2 Check Out-of-Sample Data Validation
We now repeat for the out-of-sample data.
Show code
simdata_tbl <- pnbd_onlineretail_onehiermu1_assess_data_lst |>
use_series(model_valid_simstats_filepath) |>
read_rds()
outsample_plots_lst <- create_model_assessment_plots(
obsdata_tbl = obs_validdata_tbl,
simdata_tbl = simdata_tbl
)
outsample_plots_lst$multi_plot |> print()
Show code
outsample_plots_lst$total_plot |> print()
Show code
outsample_plots_lst$quant_plot |> print()

As with our shorter-timeframe data, the model appears to be working well overall.
5 Fit Second Hierarchical Mu Model
In this model we keep a broadly similar structure, but use a different mean for the hierarchical prior on mu_mn.
5.1 Fit Stan Model
We now want to fit the model to our data using this alternative prior for mu_mn.
Show code
stan_modelname <- "pnbd_onlineretail_onehiermu2"
stanfit_prefix <- str_c("fit_", stan_modelname)
stanfit_seed <- stanfit_seed + 1
stanfit_object_file <- glue("data/{stanfit_prefix}_stanfit.rds")
stan_data_lst <- customer_fit_stats_tbl |>
select(customer_id, x, t_x, T_cal) |>
compose_data(
hier_mu_mn_p1 = 0.25,
hier_mu_mn_p2 = 1.00,
lambda_mn = 0.25,
lambda_cv = 1.00,
mu_cv = 1.00
)
if(!file_exists(stanfit_object_file)) {
pnbd_onlineretail_onehiermu2_stanfit <- pnbd_onehiermu_stanmodel$sample(
data = stan_data_lst,
chains = 4,
iter_warmup = 500,
iter_sampling = 500,
seed = stanfit_seed,
save_warmup = TRUE,
output_dir = stan_modeldir,
output_basename = stanfit_prefix,
)
pnbd_onlineretail_onehiermu2_stanfit$save_object(stanfit_object_file, compress = "gzip")
} else {
pnbd_onlineretail_onehiermu2_stanfit <- read_rds(stanfit_object_file)
}

Running MCMC with 4 chains, at most 12 in parallel...
Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 3 Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 4 Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 3 Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 4 Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 3 Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 2 finished in 114.4 seconds.
Chain 1 finished in 114.5 seconds.
Chain 3 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 3 finished in 115.1 seconds.
Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 4 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 4 finished in 135.9 seconds.
All 4 chains finished successfully.
Mean chain execution time: 120.0 seconds.
Total execution time: 136.4 seconds.
Show code
pnbd_onlineretail_onehiermu2_stanfit$summary()

# A tibble: 12,720 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <num> <num> <num> <num> <num> <num> <num> <num>
1 lp__ -5.72e+4 -5.72e+4 7.28e+1 7.46e+1 -5.73e+4 -5.71e+4 1.01 631.
2 mu_mn 4.42e-3 4.44e-3 4.59e-4 4.91e-4 3.67e-3 5.18e-3 1.10 46.7
3 lambda[1] 5.08e-2 4.13e-2 3.90e-2 2.98e-2 8.58e-3 1.27e-1 1.00 1483.
4 lambda[2] 1.17e-1 8.22e-2 1.13e-1 8.27e-2 6.18e-3 3.38e-1 1.00 1380.
5 lambda[3] 7.82e-2 5.30e-2 9.09e-2 5.40e-2 4.17e-3 2.38e-1 1.00 1674.
6 lambda[4] 5.75e-2 4.71e-2 4.06e-2 3.52e-2 9.63e-3 1.37e-1 1.00 1651.
7 lambda[5] 2.31e-1 1.56e-1 2.36e-1 1.64e-1 9.72e-3 7.22e-1 1.00 1797.
8 lambda[6] 2.95e-1 2.56e-1 2.06e-1 1.84e-1 4.90e-2 7.02e-1 1.00 1754.
9 lambda[7] 1.16e-1 7.77e-2 1.23e-1 8.42e-2 4.72e-3 3.54e-1 1.00 1591.
10 lambda[8] 4.73e-2 2.63e-2 7.29e-2 2.81e-2 2.02e-3 1.52e-1 1.00 2175.
# ℹ 12,710 more rows
# ℹ 1 more variable: ess_tail <num>
We have some basic HMC-based validity statistics we can check.
Show code
pnbd_onlineretail_onehiermu2_stanfit$cmdstan_diagnose()

Processing csv files:
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehiermu2-1.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehiermu2-2.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehiermu2-3.csv
Warning: non-fatal error reading adaptation data
/home/rstudio/btydwork/stan_models/fit_pnbd_onlineretail_onehiermu2-4.csv
Warning: non-fatal error reading adaptation data
Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.
Checking sampler transitions for divergences.
No divergent transitions found.
Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.
Effective sample size satisfactory.
The following parameters had split R-hat greater than 1.05:
mu_mn, beta
Such high values indicate incomplete mixing and biased estimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.
Processing complete.
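Since the diagnostics flag mu_mn and beta, it can be useful to pull out every parameter with a high split R-hat directly from the fit summary, using the rhat column that cmdstanr provides (this assumes dplyr is loaded, as elsewhere in this workbook):

```r
# List any parameters whose split R-hat exceeds the 1.05 threshold
# flagged by cmdstan_diagnose(), worst first.
pnbd_onlineretail_onehiermu2_stanfit$summary() |>
  filter(rhat > 1.05) |>
  arrange(desc(rhat)) |>
  select(variable, mean, sd, rhat, ess_bulk, ess_tail)
```

This narrows the mixing problem down to the hierarchical parameters, which is consistent with the low ess_bulk we see for mu_mn in the summary above.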
5.2 Visual Diagnostics of the Sample Validity
Now that we have a sample from the posterior distribution we need to create a few different visualisations of the diagnostics.
Show code
parameter_subset <- c(
"mu_mn",
"lambda[1]", "lambda[2]", "lambda[3]", "lambda[4]",
"mu[1]", "mu[2]", "mu[3]", "mu[4]"
)
pnbd_onlineretail_onehiermu2_stanfit$draws(inc_warmup = FALSE) |>
mcmc_trace(pars = parameter_subset) +
expand_limits(y = 0) +
labs(
x = "Iteration",
y = "Value",
title = "Traceplot of Sample of Lambda and Mu Values"
) +
theme(axis.text.x = element_text(size = 10))
We also check \(N_{eff}\) as a quick diagnostic of the fit.
Show code
pnbd_onlineretail_onehiermu2_stanfit |>
neff_ratio(pars = c("lambda", "mu")) |>
as.numeric() |>
mcmc_neff() +
ggtitle("Plot of Parameter Effective Sample Sizes")
5.3 Assess the Model
As we intend to run the same assessment logic on each of our models, we have combined it into a single function, run_model_assessment, which runs the simulations and combines the resulting datasets.
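The helper itself is defined elsewhere in the project; from the call below, its rough shape is something like the following. This is a hypothetical sketch only: the argument names are taken from the call, but the body is illustrative and not the project's actual implementation.

```r
# Hypothetical sketch of run_model_assessment's shape. The real
# implementation lives elsewhere in the project; argument names are
# taken from the call below, the body is illustrative only.
run_model_assessment_sketch <- function(model_stanfit, insample_tbl,
                                        fit_label, fit_end_dttm,
                                        valid_start_dttm, valid_end_dttm,
                                        sim_seed) {
  set.seed(sim_seed)

  # 1. Simulate transaction counts for each posterior draw over the
  #    fit window (up to fit_end_dttm) and the validation window
  #    (valid_start_dttm to valid_end_dttm).
  # 2. Summarise the simulations per customer and per draw.
  # 3. Write each derived table under data/ and return the file paths.

  list(
    model_fit_index_filepath   = glue::glue("data/{fit_label}_assess_fit_index_tbl.rds"),
    model_valid_index_filepath = glue::glue("data/{fit_label}_assess_valid_index_tbl.rds")
    # ...plus the simstats file paths, as in the glimpse() output below
  )
}
```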
Show code
pnbd_stanfit <- pnbd_onlineretail_onehiermu2_stanfit |>
recover_types(customer_fit_stats_tbl)
pnbd_onlineretail_onehiermu2_assess_data_lst <- run_model_assessment(
model_stanfit = pnbd_stanfit,
insample_tbl = customer_fit_subset_tbl,
fit_label = "pnbd_onlineretail_onehiermu2",
fit_end_dttm = use_fit_end_date |> as.POSIXct(),
valid_start_dttm = use_valid_start_date |> as.POSIXct(),
valid_end_dttm = use_valid_end_date |> as.POSIXct(),
sim_seed = 4240
)
pnbd_onlineretail_onehiermu2_assess_data_lst |> glimpse()
List of 5
$ model_fit_index_filepath : 'glue' chr "data/pnbd_onlineretail_onehiermu2_assess_fit_index_tbl.rds"
$ model_valid_index_filepath : 'glue' chr "data/pnbd_onlineretail_onehiermu2_assess_valid_index_tbl.rds"
$ model_simstats_filepath : 'glue' chr "data/pnbd_onlineretail_onehiermu2_assess_model_simstats_tbl.rds"
$ model_fit_simstats_filepath : 'glue' chr "data/pnbd_onlineretail_onehiermu2_assess_fit_simstats_tbl.rds"
$ model_valid_simstats_filepath: 'glue' chr "data/pnbd_onlineretail_onehiermu2_assess_valid_simstats_tbl.rds"
5.3.1 Check In-Sample Data Validation
We first check the model against the in-sample data.
Show code
simdata_tbl <- pnbd_onlineretail_onehiermu2_assess_data_lst |>
use_series(model_fit_simstats_filepath) |>
read_rds()
insample_plots_lst <- create_model_assessment_plots(
obsdata_tbl = obs_fitdata_tbl,
simdata_tbl = simdata_tbl
)
insample_plots_lst$multi_plot |> print()
Show code
insample_plots_lst$total_plot |> print()
Show code
insample_plots_lst$quant_plot |> print()
This fit looks reasonable and appears to capture most aspects of the data used to fit it. A good in-sample fit is expected rather than conclusive, but it at least confirms the model is not obviously mis-specified.
5.3.2 Check Out-of-Sample Data Validation
We now repeat for the out-of-sample data.
Show code
simdata_tbl <- pnbd_onlineretail_onehiermu2_assess_data_lst |>
use_series(model_valid_simstats_filepath) |>
read_rds()
outsample_plots_lst <- create_model_assessment_plots(
obsdata_tbl = obs_validdata_tbl,
simdata_tbl = simdata_tbl
)
outsample_plots_lst$multi_plot |> print()
Show code
outsample_plots_lst$total_plot |> print()
Show code
outsample_plots_lst$quant_plot |> print()
As with the shorter-timeframe data, overall the model is working well.
6 Compare Model Outputs
We have looked at each of the models individually, but it is also worth looking at each of the models as a group.
Show code
calculate_simulation_statistics <- function(file_rds) {
simdata_tbl <- read_rds(file_rds)
multicount_cust_tbl <- simdata_tbl |>
filter(sim_tnx_count > 0) |>
count(draw_id, name = "multicust_count")
totaltnx_data_tbl <- simdata_tbl |>
count(draw_id, wt = sim_tnx_count, name = "simtnx_count")
simstats_tbl <- multicount_cust_tbl |>
inner_join(totaltnx_data_tbl, by = "draw_id")
return(simstats_tbl)
}
Show code
obs_fit_customer_count <- obs_fitdata_tbl |>
filter(tnx_count > 0) |>
nrow()
obs_valid_customer_count <- obs_validdata_tbl |>
filter(tnx_count > 0) |>
nrow()
obs_fit_total_count <- obs_fitdata_tbl |>
pull(tnx_count) |>
sum()
obs_valid_total_count <- obs_validdata_tbl |>
pull(tnx_count) |>
sum()
obs_stats_tbl <- tribble(
~assess_type, ~name, ~obs_value,
"fit", "multicust_count", obs_fit_customer_count,
"fit", "simtnx_count", obs_fit_total_count,
"valid", "multicust_count", obs_valid_customer_count,
"valid", "simtnx_count", obs_valid_total_count
)
model_assess_tbl <- dir_ls("data", regexp = "pnbd_onlineretail_(one|fixed).*_assess_.*simstats") |>
enframe(name = NULL, value = "file_path") |>
filter(str_detect(file_path, "_assess_model_", negate = TRUE)) |>
mutate(
model_label = str_replace(file_path, "data/pnbd_onlineretail_(.*?)_assess_.*", "\\1"),
assess_type = if_else(str_detect(file_path, "_assess_fit_"), "fit", "valid"),
sim_data = map(
file_path, calculate_simulation_statistics,
.progress = "calculate_simulation_statistics"
)
)
model_assess_tbl |> glimpse()
Rows: 16
Columns: 4
$ file_path <fs::path> "data/pnbd_onlineretail_fixed1_assess_fit_simstats_tb…
$ model_label <chr> "fixed1", "fixed1", "fixed2", "fixed2", "fixed3", "fixed3"…
$ assess_type <chr> "fit", "valid", "fit", "valid", "fit", "valid", "fit", "va…
$ sim_data <list> [<tbl_df[2000 x 3]>], [<tbl_df[2000 x 3]>], [<tbl_df[2000…
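The per-draw counting inside calculate_simulation_statistics() can be illustrated on a toy simulation table. The data here is invented purely for illustration and is not part of the workbook's datasets.

```r
# Toy illustration of the per-draw counts computed by
# calculate_simulation_statistics() (invented data, for illustration only)
library(dplyr)
library(tibble)

toy_sim_tbl <- tibble(
  draw_id       = c(1, 1, 1, 2, 2, 2),
  sim_tnx_count = c(0, 2, 3, 1, 0, 4)
)

# Customers with at least one simulated transaction, per draw
toy_sim_tbl |>
  filter(sim_tnx_count > 0) |>
  count(draw_id, name = "multicust_count")
# draw 1: 2 customers, draw 2: 2 customers

# Total simulated transactions per draw (count() with wt sums the weights)
toy_sim_tbl |>
  count(draw_id, wt = sim_tnx_count, name = "simtnx_count")
# draw 1: 5 transactions, draw 2: 5 transactions
```

Joining these two per-draw tables on draw_id gives the simstats table that the function returns.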
Show code
model_assess_summstat_tbl <- model_assess_tbl |>
select(model_label, assess_type, sim_data) |>
unnest(sim_data) |>
pivot_longer(
cols = !c(model_label, assess_type, draw_id)
) |>
group_by(model_label, assess_type, name) |>
summarise(
.groups = "drop",
mean_val = mean(value),
p10 = quantile(value, 0.10),
p25 = quantile(value, 0.25),
p50 = quantile(value, 0.50),
p75 = quantile(value, 0.75),
p90 = quantile(value, 0.90)
)
model_assess_summstat_tbl |> glimpse()
Rows: 32
Columns: 9
$ model_label <chr> "fixed1", "fixed1", "fixed1", "fixed1", "fixed2", "fixed2"…
$ assess_type <chr> "fit", "fit", "valid", "valid", "fit", "fit", "valid", "va…
$ name <chr> "multicust_count", "simtnx_count", "multicust_count", "sim…
$ mean_val <dbl> 50.2350, 249.3000, 53.1105, 928.2575, 56.0095, 222.9570, 3…
$ p10 <dbl> 45.0, 204.0, 48.0, 806.9, 50.0, 187.0, 35.0, 410.0, 44.0, …
$ p25 <dbl> 48.00, 225.00, 51.00, 863.00, 53.00, 203.00, 37.00, 450.00…
$ p50 <dbl> 50.0, 249.0, 53.0, 925.5, 56.0, 222.0, 40.0, 500.0, 50.0, …
$ p75 <dbl> 53.00, 272.00, 56.00, 992.25, 59.00, 242.00, 42.00, 549.00…
$ p90 <dbl> 55.0, 295.0, 58.0, 1053.0, 61.0, 260.0, 44.0, 597.1, 55.0,…
Show code
#| echo: true
ggplot(model_assess_summstat_tbl) +
geom_errorbar(
aes(x = model_label, ymin = p10, ymax = p90), width = 0
) +
geom_errorbar(
aes(x = model_label, ymin = p25, ymax = p75), width = 0, linewidth = 3
) +
geom_hline(
aes(yintercept = obs_value),
data = obs_stats_tbl, colour = "red"
) +
scale_y_continuous(labels = label_comma()) +
expand_limits(y = 0) +
facet_wrap(
vars(assess_type, name), scales = "free_y"
) +
labs(
x = "Model",
y = "Count",
title = "Comparison Plot for the Different Models"
) +
theme(
axis.text.x = element_text(angle = 20, vjust = 0.5, size = 8)
)
6.1 Write Assessment Data to Disk
We now want to save the assessment data to disk.
Show code
model_assess_tbl |> write_rds("data/assess_data_pnbd_onlineretail_onehier_tbl.rds")
7 R Environment
Show code
options(width = 120L)
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.2.3 (2023-03-15)
os Ubuntu 22.04.2 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Dublin
date 2023-07-24
pandoc 2.19.2 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [1] RSPM (R 4.2.0)
arrayhelpers 1.1-0 2020-02-04 [1] RSPM (R 4.2.0)
backports 1.4.1 2021-12-13 [1] RSPM (R 4.2.0)
base64enc 0.1-3 2015-07-28 [1] RSPM (R 4.2.0)
bayesplot * 1.10.0 2022-11-16 [1] RSPM (R 4.2.0)
bit 4.0.5 2022-11-15 [1] RSPM (R 4.2.0)
bit64 4.0.5 2020-08-30 [1] RSPM (R 4.2.0)
boot 1.3-28.1 2022-11-22 [2] CRAN (R 4.2.3)
bridgesampling 1.1-2 2021-04-16 [1] RSPM (R 4.2.0)
brms * 2.19.0 2023-03-14 [1] RSPM (R 4.2.0)
Brobdingnag 1.2-9 2022-10-19 [1] RSPM (R 4.2.0)
cachem 1.0.7 2023-02-24 [1] RSPM (R 4.2.0)
callr 3.7.3 2022-11-02 [1] RSPM (R 4.2.0)
checkmate 2.1.0 2022-04-21 [1] RSPM (R 4.2.0)
cli 3.6.1 2023-03-23 [1] RSPM (R 4.2.0)
cmdstanr * 0.5.3 2023-07-21 [1] Github (stan-dev/cmdstanr@22b391e)
coda 0.19-4 2020-09-30 [1] RSPM (R 4.2.0)
codetools 0.2-19 2023-02-01 [2] CRAN (R 4.2.3)
colorspace 2.1-0 2023-01-23 [1] RSPM (R 4.2.0)
colourpicker 1.2.0 2022-10-28 [1] RSPM (R 4.2.0)
conflicted * 1.2.0 2023-02-01 [1] RSPM (R 4.2.0)
cowplot * 1.1.1 2020-12-30 [1] RSPM (R 4.2.0)
crayon 1.5.2 2022-09-29 [1] RSPM (R 4.2.0)
crosstalk 1.2.0 2021-11-04 [1] RSPM (R 4.2.0)
data.table 1.14.8 2023-02-17 [1] RSPM (R 4.2.0)
digest 0.6.31 2022-12-11 [1] RSPM (R 4.2.0)
directlabels * 2021.1.13 2021-01-16 [1] RSPM (R 4.2.0)
distributional 0.3.2 2023-03-22 [1] RSPM (R 4.2.0)
dplyr * 1.1.1 2023-03-22 [1] RSPM (R 4.2.0)
DT 0.27 2023-01-17 [1] RSPM (R 4.2.0)
dygraphs 1.1.1.6 2018-07-11 [1] RSPM (R 4.2.0)
ellipsis 0.3.2 2021-04-29 [1] RSPM (R 4.2.0)
evaluate 0.20 2023-01-17 [1] RSPM (R 4.2.0)
fansi 1.0.4 2023-01-22 [1] RSPM (R 4.2.0)
farver 2.1.1 2022-07-06 [1] RSPM (R 4.2.0)
fastmap 1.1.1 2023-02-24 [1] RSPM (R 4.2.0)
forcats * 1.0.0 2023-01-29 [1] RSPM (R 4.2.0)
fs * 1.6.1 2023-02-06 [1] RSPM (R 4.2.0)
furrr * 0.3.1 2022-08-15 [1] RSPM (R 4.2.0)
future * 1.32.0 2023-03-07 [1] RSPM (R 4.2.0)
gamm4 0.2-6 2020-04-03 [1] RSPM (R 4.2.0)
generics 0.1.3 2022-07-05 [1] RSPM (R 4.2.0)
ggdist 3.2.1 2023-01-18 [1] RSPM (R 4.2.0)
ggplot2 * 3.4.2 2023-04-03 [1] RSPM (R 4.2.0)
globals 0.16.2 2022-11-21 [1] RSPM (R 4.2.0)
glue * 1.6.2 2022-02-24 [1] RSPM (R 4.2.0)
gridExtra 2.3 2017-09-09 [1] RSPM (R 4.2.0)
gtable 0.3.3 2023-03-21 [1] RSPM (R 4.2.0)
gtools 3.9.4 2022-11-27 [1] RSPM (R 4.2.0)
hms 1.1.3 2023-03-21 [1] RSPM (R 4.2.0)
htmltools 0.5.5 2023-03-23 [1] RSPM (R 4.2.0)
htmlwidgets 1.6.2 2023-03-17 [1] RSPM (R 4.2.0)
httpuv 1.6.9 2023-02-14 [1] RSPM (R 4.2.0)
igraph 1.4.2 2023-04-07 [1] RSPM (R 4.2.0)
inline 0.3.19 2021-05-31 [1] RSPM (R 4.2.0)
jsonlite 1.8.4 2022-12-06 [1] RSPM (R 4.2.0)
knitr 1.42 2023-01-25 [1] RSPM (R 4.2.0)
labeling 0.4.2 2020-10-20 [1] RSPM (R 4.2.0)
later 1.3.0 2021-08-18 [1] RSPM (R 4.2.0)
lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.3)
lifecycle 1.0.3 2022-10-07 [1] RSPM (R 4.2.0)
listenv 0.9.0 2022-12-16 [1] RSPM (R 4.2.0)
lme4 1.1-32 2023-03-14 [1] RSPM (R 4.2.0)
loo 2.6.0 2023-03-31 [1] RSPM (R 4.2.0)
lubridate * 1.9.2 2023-02-10 [1] RSPM (R 4.2.0)
magrittr * 2.0.3 2022-03-30 [1] RSPM (R 4.2.0)
markdown 1.6 2023-04-07 [1] RSPM (R 4.2.0)
MASS 7.3-58.2 2023-01-23 [2] CRAN (R 4.2.3)
Matrix 1.5-3 2022-11-11 [2] CRAN (R 4.2.3)
matrixStats 0.63.0 2022-11-18 [1] RSPM (R 4.2.0)
memoise 2.0.1 2021-11-26 [1] RSPM (R 4.2.0)
mgcv 1.8-42 2023-03-02 [2] CRAN (R 4.2.3)
mime 0.12 2021-09-28 [1] RSPM (R 4.2.0)
miniUI 0.1.1.1 2018-05-18 [1] RSPM (R 4.2.0)
minqa 1.2.5 2022-10-19 [1] RSPM (R 4.2.0)
munsell 0.5.0 2018-06-12 [1] RSPM (R 4.2.0)
mvtnorm 1.1-3 2021-10-08 [1] RSPM (R 4.2.0)
nlme 3.1-162 2023-01-31 [2] CRAN (R 4.2.3)
nloptr 2.0.3 2022-05-26 [1] RSPM (R 4.2.0)
parallelly 1.35.0 2023-03-23 [1] RSPM (R 4.2.0)
pillar 1.9.0 2023-03-22 [1] RSPM (R 4.2.0)
pkgbuild 1.4.0 2022-11-27 [1] RSPM (R 4.2.0)
pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.2.0)
plyr 1.8.8 2022-11-11 [1] RSPM (R 4.2.0)
posterior * 1.4.1 2023-03-14 [1] RSPM (R 4.2.0)
prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.2.0)
processx 3.8.1 2023-04-18 [1] RSPM (R 4.2.0)
projpred 2.5.0 2023-04-05 [1] RSPM (R 4.2.0)
promises 1.2.0.1 2021-02-11 [1] RSPM (R 4.2.0)
ps 1.7.5 2023-04-18 [1] RSPM (R 4.2.0)
purrr * 1.0.1 2023-01-10 [1] RSPM (R 4.2.0)
quadprog 1.5-8 2019-11-20 [1] RSPM (R 4.2.0)
R6 2.5.1 2021-08-19 [1] RSPM (R 4.2.0)
Rcpp * 1.0.10 2023-01-22 [1] RSPM (R 4.2.0)
RcppParallel 5.1.7 2023-02-27 [1] RSPM (R 4.2.0)
readr * 2.1.4 2023-02-10 [1] RSPM (R 4.2.0)
reshape2 1.4.4 2020-04-09 [1] RSPM (R 4.2.0)
rlang * 1.1.0 2023-03-14 [1] RSPM (R 4.2.0)
rmarkdown 2.21 2023-03-26 [1] RSPM (R 4.2.0)
rstan 2.21.8 2023-01-17 [1] RSPM (R 4.2.0)
rstantools 2.3.1 2023-03-30 [1] RSPM (R 4.2.0)
rsyslog * 1.0.2 2021-06-04 [1] RSPM (R 4.2.0)
scales * 1.2.1 2022-08-20 [1] RSPM (R 4.2.0)
sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.2.0)
shiny 1.7.4 2022-12-15 [1] RSPM (R 4.2.0)
shinyjs 2.1.0 2021-12-23 [1] RSPM (R 4.2.0)
shinystan 2.6.0 2022-03-03 [1] RSPM (R 4.2.0)
shinythemes 1.2.0 2021-01-25 [1] RSPM (R 4.2.0)
StanHeaders 2.21.0-7 2020-12-17 [1] RSPM (R 4.2.0)
stringi 1.7.12 2023-01-11 [1] RSPM (R 4.2.0)
stringr * 1.5.0 2022-12-02 [1] RSPM (R 4.2.0)
svUnit 1.0.6 2021-04-19 [1] RSPM (R 4.2.0)
tensorA 0.36.2 2020-11-19 [1] RSPM (R 4.2.0)
threejs 0.3.3 2020-01-21 [1] RSPM (R 4.2.0)
tibble * 3.2.1 2023-03-20 [1] RSPM (R 4.2.0)
tidybayes * 3.0.4 2023-03-14 [1] RSPM (R 4.2.0)
tidyr * 1.3.0 2023-01-24 [1] RSPM (R 4.2.0)
tidyselect 1.2.0 2022-10-10 [1] RSPM (R 4.2.0)
tidyverse * 2.0.0 2023-02-22 [1] RSPM (R 4.2.0)
timechange 0.2.0 2023-01-11 [1] RSPM (R 4.2.0)
tzdb 0.3.0 2022-03-28 [1] RSPM (R 4.2.0)
utf8 1.2.3 2023-01-31 [1] RSPM (R 4.2.0)
vctrs 0.6.2 2023-04-19 [1] RSPM (R 4.2.0)
vroom 1.6.1 2023-01-22 [1] RSPM (R 4.2.0)
withr 2.5.0 2022-03-03 [1] RSPM (R 4.2.0)
xfun 0.38 2023-03-24 [1] RSPM (R 4.2.0)
xtable 1.8-4 2019-04-21 [1] RSPM (R 4.2.0)
xts 0.13.1 2023-04-16 [1] RSPM (R 4.2.0)
yaml 2.3.7 2023-01-23 [1] RSPM (R 4.2.0)
zoo 1.8-12 2023-04-13 [1] RSPM (R 4.2.0)
[1] /usr/local/lib/R/site-library
[2] /usr/local/lib/R/library
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Show code
options(width = 80L)